A Native Tensor–Vector Multiplication Algorithm for High Performance Computing

Authors

Abstract

Tensor computations are important mathematical operations for applications that rely on multidimensional data. The tensor–vector multiplication (TVM) is the most memory-bound tensor contraction in this class of operations. This article proposes an open-source TVM algorithm which is much simpler and more efficient than previous approaches, making it suitable for integration into the most popular BLAS libraries available today. Our algorithm has been written from scratch and features unit-stride memory accesses, cache awareness, mode obliviousness, full vectorization, and multi-threading, as well as NUMA awareness for non-hierarchically stored dense tensors. Numerical experiments are carried out on tensors up to order 10 with various compilers and on hardware architectures equipped with traditional DDR and high bandwidth memory (HBM). For large tensors, the average performance of the TVM ranges between 62% and 76% of the theoretical bandwidth on NUMA systems with DDR memory and remains independent of the contraction mode. On NUMA systems with HBM, the TVM exhibits some mode dependency but manages to reach performance figures close to peak values. Finally, the higher-order power method is benchmarked with the proposed TVM kernel and delivers between 58% and 69% of the theoretical bandwidth.
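
To make the operation concrete, the following is a minimal Python sketch of a mode-k tensor–vector multiplication on a dense tensor stored contiguously in row-major order. It is not the article's algorithm: the function name `tvm`, the flat-list storage, and the loop order are assumptions made for illustration. It only shows how the tensor can be viewed as a `left x n_k x right` block so that the innermost loop walks memory with unit stride, whichever mode is contracted.

```python
from math import prod


def tvm(a, shape, x, mode):
    """Contract a dense row-major tensor with the vector x along `mode`.

    a     : flat list holding the tensor entries (last index varies fastest)
    shape : tuple of dimension sizes (n_0, ..., n_{d-1})
    x     : vector of length shape[mode]
    mode  : index of the dimension to contract (hypothetical 0-based convention)
    """
    n_k = shape[mode]
    if len(x) != n_k:
        raise ValueError("vector length must match the contracted dimension")
    if len(a) != prod(shape):
        raise ValueError("flat tensor length must match the shape")

    left = prod(shape[:mode])        # product of the dimensions before the mode
    right = prod(shape[mode + 1:])   # product of the dimensions after the mode
    y = [0.0] * (left * right)

    for l in range(left):
        y_off = l * right
        for j in range(n_k):
            xj = x[j]
            a_off = (l * n_k + j) * right
            # Innermost loop: consecutive addresses in both a and y.
            for r in range(right):
                y[y_off + r] += a[a_off + r] * xj
    return y


# Usage: a 2 x 3 x 4 tensor with entries 0..23, contracted along mode 1.
a = [float(v) for v in range(24)]
y = tvm(a, (2, 3, 4), [1.0, 1.0, 1.0], mode=1)
print(y)  # [12.0, 15.0, 18.0, 21.0, 48.0, 51.0, 54.0, 57.0]
```

The result has order d - 1 (here a 2 x 4 tensor stored flat). A tuned kernel would add blocking, vectorization, and thread and NUMA placement on top of this loop structure; none of that is attempted here.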

Similar resources

High-Performance Matrix Multiplication

This document describes techniques for speeding up matrix multiplication on some high-performance computer architectures, including the IBM RS-6000, the IBM 3090/600S-VF, the MIPS RC3240 and RC6280, the Stardent 3040, and the Sun SPARCstation. The methods illustrate general principles that can be applied to the inner loops of scientific code.

A SIMPLE ALGORITHM FOR COMPUTING TOPOLOGICAL INDICES OF DENDRIMERS

Dendritic macromolecules have attracted much attention as organic examples of well-defined nanostructures. These molecules are ideal model systems for studying how physical properties depend on molecular size and architecture. In this paper, using a simple result, some GAP programs are prepared to compute the Wiener and hyper-Wiener indices of dendrimers.

A SIMPLE ALGORITHM FOR COMPUTING DETOUR INDEX OF NANOCLUSTERS

Let G be the chemical graph of a molecule. The matrix D = [d_ij] is called the detour matrix of G if d_ij is the length of the longest path between atoms i and j. The sum of all entries above the main diagonal of D is called the detour index of G. In this paper, a new algorithm for computing the detour index of molecular graphs is presented. We apply our algorithm on copper and silver nanoclusters ...

Task Scheduling Algorithm for High Performance Heterogeneous Distributed Computing Systems

The main objective of task scheduling is to assign tasks to available processors so as to produce the minimum schedule length without violating the precedence constraints. Several algorithms have been proposed for solving the task-scheduling problem. Most of them do not take into account the average communication of parents and the data ready time. In this paper, a new static scheduling...

An Efficient Algorithm for Multiplication Based on DNA Computing

DNA computing utilizes the properties of DNA to perform computations. The computations include arithmetic and logical operations such as multiplication. We first show a procedure for multiplying a pair of binary numbers. The procedure mainly consists of bit-shift operations, where each operation depends on the bit position of the ones (ignoring the zeros) in the multiplicand, and finally add...
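
The bit-shift scheme this abstract describes resembles conventional shift-and-add multiplication. The sketch below is an ordinary software analogue in Python, not the DNA-strand procedure, whose details are cut off in the abstract. The function name `shift_add_multiply` is hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Multiply two non-negative integers by shift-and-add.

    For every 1 bit in the multiplicand, the multiplier is shifted left by
    that bit's position and added to the running product; 0 bits are skipped.
    """
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("only non-negative integers are supported")

    product = 0
    position = 0
    while multiplicand >> position:
        if (multiplicand >> position) & 1:
            product += multiplier << position
        position += 1
    return product


# Usage: 0b1011 (11) times 0b110 (6).
print(shift_add_multiply(0b1011, 0b110))  # 66
```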

Journal

Journal title: IEEE Transactions on Parallel and Distributed Systems

Year: 2022

ISSN: 1045-9219, 1558-2183, 2161-9883

DOI: https://doi.org/10.1109/tpds.2022.3153113